Differences in the Corruption Perceptions Index

Maria Ines Aran - October 2019


Corruption Perceptions Index

The 2018 Corruption Perceptions Index (CPI), published by Transparency International (TI), measures the perceived levels of public sector corruption in 180 countries and territories. Drawing on 13 surveys of businesspeople and expert assessments (sources), the index scores each country on a scale of zero (highly corrupt) to 100 (very clean).

Each source assigns its own corruption score to every country it covers, and Transparency International rescales these scores so that they are comparable across sources.
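
Transparency International documents its exact standardization procedure in its CPI methodology notes. As a rough illustration of the general idea only, the sketch below puts raw scores from two sources with different native scales onto one common 0 to 100 scale using z-scores. The helper `rescale_source` and its `target_mean` and `target_sd` defaults are assumptions made for this example, not TI's published values.

```r
# Hypothetical helper: rescale one source's raw scores onto a 0-100 scale via
# z-scores. target_mean and target_sd are illustrative assumptions, not TI values.
rescale_source <- function(raw, target_mean = 50, target_sd = 20) {
  # Standardise: distance from the source's own mean, in standard deviations
  z <- (raw - mean(raw, na.rm = TRUE)) / sd(raw, na.rm = TRUE)
  scaled <- target_mean + target_sd * z
  # Clip to the 0-100 range used by the CPI (NA stays NA)
  pmin(pmax(scaled, 0), 100)
}

# Two toy sources rating the same four countries on different native scales
source_a <- c(1.5, 3.0, 4.5, 6.0)  # e.g. a 1-7 rating
source_b <- c(20, 40, 60, 80)      # e.g. a 0-100 rating

# Same relative positions, so both end up with the same rescaled scores
rescale_source(source_a)
rescale_source(source_b)
```

Because only each country's position relative to the source's own mean and spread survives, two sources that rank and space countries identically produce identical rescaled scores, whatever their original units.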

What is this analysis about?

The goal of this analysis is to understand the differences in corruption perception across sources.

Data

For this analysis, I used data from Transparency International’s Corruption Perceptions Index 2018.

The 2018 CPI draws on 13 surveys and expert assessments (sources) to measure public sector corruption in 180 countries and territories, giving each a score from zero (highly corrupt) to 100 (very clean).

The table shows the score that each source assigned to each country in 2018, plus each country's CPI (the average of its source scores), its standard error, and the lower and upper bounds of its confidence interval.
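
To make the relationship between these columns concrete, here is a minimal sketch that summarizes one country's source scores. It assumes that the CPI is the mean of the available source scores and that the interval is the mean plus or minus `z` standard errors. The helper `summarise_country`, the value `z = 1.645` (a 90% normal interval) and the example scores are assumptions for illustration, not values taken from the data set.

```r
# Hypothetical helper: summarise one country's (already rescaled) source scores.
# Assumes CPI = mean of available scores and interval = mean +/- z * SE;
# z = 1.645 (a 90% normal interval) is an assumption for illustration.
summarise_country <- function(scores, z = 1.645) {
  scores <- scores[!is.na(scores)]  # drop sources that did not score the country
  n   <- length(scores)
  cpi <- mean(scores)
  se  <- sd(scores) / sqrt(n)       # standard error of the mean
  c(cpi = cpi, n = n, se = se, lower = cpi - z * se, upper = cpi + z * se)
}

# Toy scores from six sources; one source did not cover this country
summarise_country(c(80, 30, NA, 55, 60, 50))
```

Under these assumptions the interval widens when sources disagree (larger standard deviation) and when fewer sources cover a country (smaller `n`).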

# Import libraries
library(readxl)
library(tidyverse)
library(Hmisc)
library(knitr)
library(here)
library(gridExtra)
library(plotly)

# Data import
data <- read_excel(here("data","2018_CPI_FullDataSet.xlsx"), sheet = "CPI2018", skip = 2)
data_across_years_sources <- read.csv(here('data','cpi_across_years.csv'))

# Data manipulation
# Keep the columns used below (including country, CPI score, lower and upper CI),
# selected by position, and sort countries by ascending CPI score
x <- data %>% select(1,4,7,8,9)
x <- x[order(x$`CPI Score 2018`, decreasing=FALSE),]
x$Country <- factor(x$Country, levels = x$Country)

# Width of each country's confidence interval
x$Difference <- x$`Upper CI` - x$`Lower CI`

# x is sorted in ascending order, so tail() holds the 20 highest scores
# (x.head) and head() the 20 lowest (x.tail)
x.head <- tail(x, n = 20)
x.tail <- head(x, n = 20)
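
Because `x` is sorted in ascending order, the 20 highest scores are taken with `tail()` and the 20 lowest with `head()`. A minimal alternative that names each subset by what it contains is sketched below; `top_bottom` is a hypothetical helper, not part of the original analysis.

```r
# Hypothetical helper: top and bottom n rows by a score column, named explicitly
top_bottom <- function(df, score_col, n = 20) {
  ord <- order(df[[score_col]], decreasing = TRUE)  # highest score first
  list(
    top    = df[head(ord, n), , drop = FALSE],  # highest scores (least corrupt)
    bottom = df[tail(ord, n), , drop = FALSE]   # lowest scores (most corrupt)
  )
}

# Toy data, not the 2018 CPI
toy <- data.frame(Country = c("A", "B", "C", "D", "E"),
                  score   = c(88, 10, 52, 76, 43))
tb <- top_bottom(toy, "score", n = 2)
tb$top$Country
tb$bottom$Country
```

Note that `bottom` keeps the descending order, so its last row is the lowest-scoring country.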

Analysis

How is CPI distributed?

The average CPI score across countries is 43. More than two-thirds of countries score below 50.
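
Both headline figures can be computed from any vector of CPI scores. A minimal sketch, using a hypothetical helper `cpi_summary` and a toy vector rather than the 2018 data:

```r
# Hypothetical helper: mean score and share of countries scoring below 50
cpi_summary <- function(scores) {
  c(mean = mean(scores, na.rm = TRUE),
    # mean() of a logical vector is the proportion of TRUE values
    share_below_50 = mean(scores < 50, na.rm = TRUE))
}

# Toy vector, not the 2018 data
cpi_summary(c(88, 76, 52, 43, 30, 20, 10))
```

Applied to the analysis data, the same function would take the CPI Score 2018 column of `x`.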

# Histogram of CPI scores with a density curve overlaid
p <- ggplot(x, aes(x = `CPI Score 2018`)) +
  geom_histogram(aes(y = after_stat(density)), colour = "black", fill = "white") +
  geom_density(alpha = .2, fill = "#3695d8")

# Dashed gold line at the mean CPI score
p + geom_vline(xintercept = mean(x$`CPI Score 2018`),
               color = "gold", linetype = "dashed", linewidth = 1) +
  theme(panel.background = element_blank())

Countries with highest CPI

The plot shows each country's CPI score together with the lower and upper bounds of its confidence interval. Denmark is the country with the highest CPI score (88), with an interval from 84 to 92. Iceland is worth noticing: its CPI is 76, but its interval is wider, from 69 to 83, a sign that its sources agree less.

library(ggplot2)
plot <- ggplot(x.head, aes(x=Country, y=`CPI Score 2018`)) + 
        geom_pointrange(aes(ymin=`Lower CI`, ymax=`Upper CI`)) +
        geom_point(color='green')+
        ggtitle("Top 20") +
        ylab("CPI Score 2018") + coord_flip()

plot + theme(axis.ticks = element_blank(),
             axis.title.y = element_blank(),
             plot.title = element_text(size=14, hjust = 0.5),
             panel.background = element_blank())

Countries with lowest CPI

The plot shows each country's CPI score together with the lower and upper bounds of its confidence interval. Somalia is the country with the lowest CPI score; its interval runs from 5 to 15.

library(ggplot2)
plot <- ggplot(x.tail, aes(x=Country, y=`CPI Score 2018`)) + 
        geom_pointrange(aes(ymin=`Lower CI`, ymax=`Upper CI`)) +
        ggtitle("Bottom 20") +
        geom_point(color = 'red')+ 
        ylab("CPI Score 2018") + coord_flip()

plot + theme(axis.ticks = element_blank(),
             axis.title.y = element_blank(),
             plot.title = element_text(size=14, hjust = 0.5),
             panel.background = element_blank())

Analysis of CPI across sources

Which countries have the biggest CPI spread?

Oman, Comoros and Gambia have the widest confidence intervals. Oman's CPI is 52 and its interval runs from 36 to 68. Comoros and Gambia, both with CPI scores below 40, each have a 30-point gap between their lower and upper bounds.

# Reorder the data frame by confidence interval width and keep the 20 widest
x.difference <- x[order(x$Difference, decreasing=FALSE),]
x.difference$Country <- factor(x.difference$Country, levels = x.difference$Country)
x.difference.all <- x.difference
x.difference <- tail(x.difference, n = 20)
plot <- ggplot(x.difference, aes(x=Country, y=Difference)) + 
        ggtitle("Difference between upper and lower CI") +
        geom_point(color = 'orange', size = 3) +
        geom_text(aes(label = Difference), hjust=0, vjust=0) +
        geom_segment(aes(x=Country, xend=Country, y=0, yend=Difference), color="grey", linewidth=0.7) + 
        coord_flip()

plot + theme(
             axis.ticks = element_blank(),
             axis.title.y = element_blank(),
             plot.title = element_text(size=14, hjust = 0.5),
             panel.background = element_blank()
             )
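
The width plotted above is Upper CI minus Lower CI, which is not the same as the range of the scores the sources actually assigned (Oman's sources score it from 21 to 87, while its interval spans 36 to 68). A minimal sketch for computing the raw range per country, assuming a table with one row per country and one numeric column per source; `source_range` is a hypothetical helper:

```r
# Hypothetical helper: lowest score, highest score and their difference across
# sources for each country. `scores` has one row per country, one column per
# source, with NA where a source did not score the country.
source_range <- function(scores) {
  scores <- as.matrix(scores)
  row_min <- function(r) if (all(is.na(r))) NA_real_ else min(r, na.rm = TRUE)
  row_max <- function(r) if (all(is.na(r))) NA_real_ else max(r, na.rm = TRUE)
  lo <- apply(scores, 1, row_min)
  hi <- apply(scores, 1, row_max)
  data.frame(min_source = lo, max_source = hi, range_source = hi - lo)
}

# Toy data: two countries, three sources (the third did not score country 1)
toy <- data.frame(src1 = c(80, 90), src2 = c(25, 84), src3 = c(NA, 88))
source_range(toy)
```

With the 2018 data, where the source columns sit at positions 10 to 22, the call would be `source_range(data[, 10:22])`.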

The Oman case

Oman's CPI is 52, with a confidence interval from 36 to 68. Such a wide interval shows that sources perceive corruption in this country's public sector very differently.

Five sources scored Oman. 'World Economic Forum EOS' gave it 87, almost as 'clean' as Denmark, whose CPI is 88. At the same time, the 'Bertelsmann Foundation Transformation Index' scored it 21, which is 21 points below the average CPI and comparable to the levels of corruption in Zimbabwe and Cambodia.

# Scores assigned to Oman by each source (one row per source, NA if not scored)
oman <- data %>% filter(Country == "Oman") %>% select(10:22)
newdf <- data.frame(t(oman))
newdf <- cbind(Source = rownames(newdf), newdf)
rownames(newdf) <- 1:nrow(newdf)
colnames(newdf)[2] <- "CPI"
oman <- newdf
rm(newdf)

plot <- ggplot(oman, aes(x=Source, y=CPI)) + 
        ggtitle("How do sources perceive corruption in Oman?") +
        geom_point(color = 'orange', size = 5) +
        ylab("CPI Score 2018") +  ylim(0, 100)

plot + theme(plot.title = element_text(size=14, hjust = 0.5),
             axis.title.y = element_blank(),
             axis.ticks = element_blank(),
             panel.background = element_blank()
             ) + coord_flip()
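
The same reshaping is repeated later for Denmark. A hypothetical helper, `country_scores`, could produce the per-source table for any country. This sketch assumes, as the code above does, that the source columns sit at positions 10 to 22 and that the country column is named `Country`:

```r
# Hypothetical helper: one row per source with the score it gave `country`.
# Assumes a `Country` column and source scores at positions `source_cols`.
country_scores <- function(data, country, source_cols = 10:22) {
  row <- data[data$Country == country, source_cols, drop = FALSE]
  data.frame(Source = names(row),
             CPI = as.numeric(unlist(row[1, ])),  # NA where a source did not score it
             row.names = NULL)
}

# Toy data with two sources at positions 2 and 3
toy <- data.frame(Country = c("X", "Y"),
                  `Source A` = c(10, 20),
                  `Source B` = c(NA, 40),
                  check.names = FALSE)
country_scores(toy, "Y", source_cols = 2:3)
```

With the 2018 data, `country_scores(data, "Oman")` would return a Source/CPI table with the same columns as `oman` above.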

On the sources

How do sources score?

Sources around the world score a Corruption Index based on surveys and expert assessments. The aim is to measure public sector corruption in 180 countries by giving each a score that ranges from zero (highly corrupt) to 100 (very clean).

Not all sources provide scores for all countries; some specialize in certain regions. On the one hand, 'Global Insight Country Risk Ratings' scored all the countries in the list and 'Varieties of Democracy Project' scored 95% of them. On the other hand, 'PERC Asia Risk Guide' provided scores for only 15 countries.
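
Coverage per source can be counted directly from the missing values in the source columns. A minimal sketch, where `source_coverage` is a hypothetical helper and the toy table stands in for the 13 source columns (positions 10 to 22 in the 2018 data):

```r
# Hypothetical helper: number and share of countries each source scored.
# `scores` has one row per country and one column per source (NA = not scored).
source_coverage <- function(scores) {
  covered <- colSums(!is.na(scores))
  data.frame(Source = names(covered),
             Count  = as.integer(covered),
             Share  = as.numeric(covered) / nrow(scores),
             row.names = NULL)
}

# Toy data: a global source and a regional one
toy <- data.frame(global = c(1, 2, 3, 4), regional = c(NA, NA, 5, 6))
source_coverage(toy)
```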

'Bertelsmann Foundation Sustainable Governance Index' is the source that assigned the highest scores on average (66), evaluating 41 countries in the Middle East & North Africa, Sub-Saharan Africa, Western Europe, the Americas, Asia-Pacific, and Europe and Central Asia regions. 'African Development Bank CPIA' assigned the lowest scores on average (28), analyzing 38 countries in the Sub-Saharan Africa region.

'IMD World Competitiveness Yearbook' and 'Economist Intelligence Unit Country Ratings' spread their scores over a wider range of values (a larger standard deviation) than the other sources. Both sources cover countries in all regions, but 'Economist Intelligence Unit Country Ratings' analyzed twice as many countries as 'IMD World Competitiveness Yearbook'.

# Subset data: the 13 source columns
df.sources.raw <- data %>% select(10:22)

# Calculate average, min, max, count and standard deviation for each source
# (NA means the source did not score that country).
# make.names() turns source names into dotted names such as
# "World.Economic.Forum.EOS", which the plotting code below relies on.
df.sources <- data.frame(
  Source = make.names(names(df.sources.raw)),
  Avg    = sapply(df.sources.raw, mean, na.rm = TRUE),
  Min    = sapply(df.sources.raw, min, na.rm = TRUE),
  Max    = sapply(df.sources.raw, max, na.rm = TRUE),
  Count  = colSums(!is.na(df.sources.raw)),
  Std    = sapply(df.sources.raw, sd, na.rm = TRUE),
  row.names = NULL
)
df.sources$dif_max_min <- df.sources$Max - df.sources$Min

# Order sources by average score
df.sources <- df.sources[order(df.sources$Avg, decreasing=FALSE),]
df.sources$Source <- factor(df.sources$Source, levels = df.sources$Source)

# Highlight the two sources with the widest spread of scores
highlight <- df.sources$Source %in% c("IMD.World.Competitiveness.Yearbook",
                                      "Economist.Intelligence.Unit.Country.Ratings")
plot <- ggplot(df.sources, aes(x=Source, y=Avg)) + 
        geom_pointrange(aes(ymin=Avg-Std, ymax=Avg+Std)) +
        ggtitle("How do sources assign scores?") + ylab("CPI Score 2018") +
        geom_segment(aes(x=Source, xend=Source, y=Avg-Std, yend=Avg+Std),
                     color = ifelse(highlight, "#3695d8", "grey"),
                     linewidth = ifelse(highlight, 1.3, 0.7)) +
        geom_point(color = 'orange')


plot + theme(axis.title.y = element_blank(),
             axis.ticks = element_blank(),
             plot.title = element_text(size=14, hjust = 0.5),
             panel.background = element_blank()
             ) + coord_flip()

Corruption perception differences across sources

For each country-source pair, I calculated the absolute difference between the country's CPI and the score assigned by that source. For each source, these differences were then summed and divided by the number of countries the source scored. The larger this indicator, the more the source's perception of corruption differs from the average.
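
Written as a formula (the notation is introduced here for illustration), the indicator for a source $s$ that scored the set of countries $C_s$, with $n_s = |C_s|$, is its mean absolute difference from the CPI:

$$
D_s = \frac{1}{n_s} \sum_{c \in C_s} \left| x_{c,s} - \mathrm{CPI}_c \right|
$$

where $x_{c,s}$ is the score source $s$ assigned to country $c$ and $\mathrm{CPI}_c$ is that country's CPI score.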

'World Economic Forum EOS' is the source whose corruption perception differs most from the average, followed by 'Varieties of Democracy Project'.

# Subset dataset: country, CPI score and the 13 source columns
df.sources.differences <- data %>% select(1,4,10:22)

# Absolute difference between the CPI and each source's score, for each country
# (subtracting the CPI vector recycles it down every source column)
cpi <- df.sources.differences$`CPI Score 2018`
abs.diff <- abs(as.matrix(df.sources.differences[, 3:15]) - cpi)

# Sum the differences for each source and divide by the number of countries it scored
sum.differences.abs.count <- data.frame(
  Source = make.names(colnames(abs.diff)),
  Count = colSums(!is.na(abs.diff)),
  Sum_of_distance_to_CPI = colSums(abs.diff, na.rm = TRUE),
  row.names = NULL
)
sum.differences.abs.count$Unit_difference <-
  sum.differences.abs.count$Sum_of_distance_to_CPI / sum.differences.abs.count$Count

# Order data
sum.differences.abs.count <- sum.differences.abs.count[order(sum.differences.abs.count$Unit_difference, decreasing=FALSE),]
sum.differences.abs.count$Source <- factor(sum.differences.abs.count$Source, levels = sum.differences.abs.count$Source[order(sum.differences.abs.count$Unit_difference, decreasing = FALSE)])
plot <- ggplot(sum.differences.abs.count, aes(x=Source, y=Unit_difference)) + 
        ggtitle("Corruption perception differences") + ylab("Mean absolute difference from CPI") +
        geom_point(color = 'orange', size = 5) +  coord_flip()

plot + theme(plot.title = element_text(size=14, hjust = 0.5),
             axis.title.y = element_blank(),
             panel.background = element_blank(),
             axis.ticks = element_blank()
             ) 

Differences across years and sources

# Function to plot the scores one source assigned over the years, one line per country
plotting <- function(dataset, source){
  data_to_plot <- dataset %>% filter(Source == source)

  p <- ggplot(data = data_to_plot, aes(x = year, y = Cpi, group = Country, color = Region)) +
       geom_line(linewidth = 2)
  p + theme(panel.background = element_blank())
}

# Vector of sources
sources <- unique(data_across_years_sources$Source)

# Generate one plot per source and keep the plots in a list
plots <- lapply(sources, function(source) plotting(data_across_years_sources, source))

# Plots: the first two sources (grid.arrange(grobs = plots, ncol = 4) shows all of them)
grid.arrange(grobs = plots[1:2], ncol = 2)

# Interactive version: the same line plot for a single source, converted with ggplotly()
plotting <- function(dataset,source){
  data_to_plot<- dataset %>% filter(Source == source)

  p <-ggplot(data=data_to_plot, aes(x=year, y=Cpi, group = Country, color = Region)) +
      geom_line(size=1) +
      ggtitle(source) +
      xlab('Year') +
      ylab('CPI')
  
  p <- p + theme(panel.background = element_blank(),
            plot.title = element_text(hjust = 0.5))
  
  gg <- ggplotly(p)
  
  # Draw the first trace in gold and show the ggplot tooltip text on hover
  gg <- style(gg, line = list(color = 'gold'), hoverinfo = 'text', traces = 1)
  
  gg
  
}

plotting(data_across_years_sources,'World Bank CPIA')

Conclusion

Transparency International (TI) published the 2018 Corruption Perceptions Index (CPI). It measures the perceived levels of public sector corruption in 180 countries and territories on a scale of zero (highly corrupt) to 100 (very clean), and it is built from the surveys and expert assessments of 13 different sources.

Transparency International publishes on its website both the CPI and the score each source assigned to each country. This information leads to these key findings:


Data Sonification

library(sonify)
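# Play Oman's source scores, ordered from lowest to highest, as a 2-second tone
oman.cc <- oman[complete.cases(oman), ]   # keep only sources that scored Oman
oman.cc$CPI <- as.double(oman.cc$CPI)
oman.cc <- oman.cc[order(oman.cc$CPI),]
obj <- sonify(x=t(oman.cc['CPI']), duration=2, play=TRUE, waveform= 'triangle')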
# Scores assigned to Denmark by each source (one row per source, NA if not scored)
denmark <- data %>% filter(Country == "Denmark") %>% select(10:22)
newdf <- data.frame(t(denmark))
newdf <- cbind(Source = rownames(newdf), newdf)
rownames(newdf) <- 1:nrow(newdf)
colnames(newdf)[2] <- "CPI"
denmark <- newdf
rm(newdf)

plot <- ggplot(denmark, aes(x=Source, y=CPI)) + 
        ggtitle("How do sources perceive corruption in Denmark?") +
        geom_point(color = 'orange', size = 5) +
        ylab("CPI Score 2018") + ylim(0, 100)

plot + theme(plot.title = element_text(size=14, hjust = 0.5),
             axis.title.y = element_blank(),
             axis.ticks = element_blank(),
             panel.background = element_blank()
             ) + coord_flip()

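# Play Denmark's source scores, ordered from lowest to highest, as a 2-second tone
denmark.cc <- denmark[complete.cases(denmark), ]   # keep only sources that scored Denmark
denmark.cc$CPI <- as.double(denmark.cc$CPI)
denmark.cc <- denmark.cc[order(denmark.cc$CPI),]
obj <- sonify(x=t(denmark.cc['CPI']), duration=2, play=TRUE, waveform= 'triangle')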
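
The sonify calls above play each country's source scores, ordered from lowest to highest, as a short tone whose pitch follows the scores. One simple way to map scores to pitch is a linear rescaling onto a frequency range. The sketch below only illustrates that idea: `score_to_freq` is a hypothetical helper, the 440 to 880 Hz range is an assumption, and the sonify package's own mapping may differ.

```r
# Hypothetical helper: map scores linearly onto a frequency range in Hz.
# fmin and fmax are illustrative assumptions, not sonify's internals.
score_to_freq <- function(scores, fmin = 440, fmax = 880) {
  rng <- range(scores, na.rm = TRUE)
  # All scores equal: use the middle of the frequency range
  if (diff(rng) == 0) return(rep((fmin + fmax) / 2, length(scores)))
  fmin + (scores - rng[1]) / diff(rng) * (fmax - fmin)
}

# Lowest score -> lowest pitch, highest score -> highest pitch
score_to_freq(c(10, 50, 90))
```

Because this sketch rescales to the range of its own input, every country's tone spans the full frequency range; only the shape of the melody reflects how the scores are distributed.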

Done by Maria Ines Aran

aranmariaines@gmail.com